Consistency of sparse PCA in High Dimension, Low Sample Size contexts

Authors

  • Dan Shen
  • Haipeng Shen
  • J. S. Marron
Abstract

Sparse Principal Component Analysis (PCA) methods are efficient tools to reduce the dimension (or number of variables) of complex data. Sparse principal components (PCs) are easier to interpret than conventional PCs, because most loadings are zero. We study the asymptotic properties of these sparse PC directions for scenarios with fixed sample size and increasing dimension (i.e. High Dimension, Low Sample Size (HDLSS)). We consider the previously studied single spike covariance model and assume in addition that the maximal eigenvector is sparse. We extend the existing HDLSS asymptotic consistency and strong inconsistency results of conventional PCA in an entirely new direction. We find a large set of sparsity assumptions under which sparse PCA is still consistent even when conventional PCA is strongly inconsistent. The consistency of sparse PCA is characterized along with rates of convergence. Furthermore, we clearly identify the mathematical boundaries of the sparse PCA consistency, by showing strong inconsistency for an oracle version of sparse PCA beyond the consistent region, as well as its inconsistency on the boundaries of the consistent region. Simulation studies are performed to validate the asymptotic results in finite samples.
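The contrast described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the estimators or the simulation design from the paper: it assumes Gaussian data, a spike of size d^alpha, a leading eigenvector with roughly d^beta equal nonzero entries, and a simple hard-thresholding step that is told the true support size. All function names, the (alpha, beta) parametrization, and the default values are illustrative assumptions.

```python
import numpy as np


def simulate_single_spike(n, d, spike, support_size, rng):
    """Draw n rows from N(0, I + (spike - 1) * u u^T) for a sparse unit vector u."""
    u = np.zeros(d)
    u[:support_size] = 1.0 / np.sqrt(support_size)
    z = rng.standard_normal((n, d))
    # (I + (sqrt(spike) - 1) u u^T) z has covariance I + (spike - 1) u u^T.
    x = z + (np.sqrt(spike) - 1.0) * np.outer(z @ u, u)
    return x, u


def leading_direction(x):
    """First right singular vector of x (data assumed to have mean zero)."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[0]


def hard_threshold(v, keep):
    """Zero all but the `keep` largest-magnitude entries, then renormalize."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-keep:]
    out[idx] = v[idx]
    return out / np.linalg.norm(out)


def angle_degrees(a, b):
    """Angle between two unit directions, ignoring sign."""
    return float(np.degrees(np.arccos(min(1.0, abs(a @ b)))))


def compare(n=20, d=2000, alpha=0.7, beta=0.3, seed=0):
    """Return (conventional angle, thresholded angle) to the true direction."""
    rng = np.random.default_rng(seed)
    spike = d ** alpha                      # assumed spike size d^alpha
    support_size = max(1, int(round(d ** beta)))  # assumed sparsity d^beta
    x, u = simulate_single_spike(n, d, spike, support_size, rng)
    v = leading_direction(x)
    # Oracle assumption: the thresholding step knows the true support size.
    v_sparse = hard_threshold(v, support_size)
    return angle_degrees(v, u), angle_degrees(v_sparse, u)


if __name__ == "__main__":
    conventional, sparse = compare()
    print(f"conventional PCA angle: {conventional:.1f} degrees")
    print(f"thresholded PCA angle:  {sparse:.1f} degrees")
```

Calling `compare()` with a larger `d` while holding `n` fixed gives a rough feel for the HDLSS regime; the angle to the true direction is used here as an informal proxy for consistency.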

Similar resources

Discussion of large covariance estimation by thresholding principal orthogonal complements

We congratulate the authors on a very interesting contribution, which takes the fundamentally important field of covariance matrix estimation in some important new directions. We agree that now is a good time to be studying asymptotic contexts, where the first K eigenvalues of Σ grow quickly. The asymptotic mode of the sample size tending to infinity, with an exponentially growing dimension can...

PCA Consistency in High Dimension, Low Sample Size Context

Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows (i.e. High Dimension, Low Sample Size (HDLSS)) are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLS...

Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data

MOTIVATION Principal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, it is known that PCA may not consistently estimate the true direction of maximal variability in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of ...

Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds

SUNGKYU JUNG: Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds. (Under the direction of Dr. J. S. Marron.) The dissertation consists of two research topics regarding modern non-standard data analytic situations. In particular, data under the High Dimension, Low Sample Size (HDLSS) situation and data lying on manifolds are analyzed. These situations are rela...

Journal:
  • J. Multivariate Analysis

Volume 115  Issue

Pages  -

Publication date 2013